Wikipedia and Machine Translation: killing two birds with one stone

Authors

  • Iñaki Alegria
  • Unai Cabezon
  • Unai Fernandez
  • Gorka Labaka
  • Arkaitz Zubiaga

Abstract

In this paper we present the free/open-source language resources for machine translation created in the OpenMT-2 wikiproject, a collaboration framework that was tested with editors of Basque Wikipedia. Post-editing of Computer Science articles was used to improve the output of Matxin, a Spanish-to-Basque MT system. For the collaboration between editors and researchers, we selected a set of 100 articles from the Spanish Wikipedia, which were then translated into Basque with the MT engine. A group of volunteers from Basque Wikipedia reviewed and corrected the raw MT output. This collaboration produced two main benefits: (i) change logs that can potentially help improve the MT engine through an automated statistical post-editing system, and (ii) the growth of Basque Wikipedia. The results show that this process can improve the accuracy of a rule-based machine translation system by nearly 10%, drawing on the post-editing of 50,000 words in the Computer Science domain. We believe that our conclusions can be extended to other domains and to MT engines for other less-resourced languages that lack large parallel corpora or frequently updated lexical knowledge.
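
The abstract describes using the editors' change logs to train an automated statistical post-editing system. The sketch below does not reproduce that system; it is a deliberately simplified stand-in, assuming only that the logs can be read as (raw MT output, post-edited text) pairs. It aligns the two token sequences with Python's `difflib`, counts recurring replacements, and keeps those seen at least `min_count` times as correction rules. The function names, the threshold, and the toy English sentences are hypothetical illustrations, not data or code from the OpenMT-2 project; a real statistical post-editor would instead train a full translation model from MT output to post-edited text.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher


def extract_edits(mt: str, post_edit: str):
    """Yield (mt_phrase, corrected_phrase) pairs where the editor replaced tokens."""
    src, tgt = mt.split(), post_edit.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # ignore pure insertions and deletions in this sketch
            yield " ".join(src[i1:i2]), " ".join(tgt[j1:j2])


def learn_rules(pairs, min_count=2):
    """Keep the most frequent correction for each MT phrase seen >= min_count times."""
    counts = defaultdict(Counter)
    for mt, pe in pairs:
        for wrong, right in extract_edits(mt, pe):
            counts[wrong][right] += 1
    rules = {}
    for wrong, options in counts.items():
        right, n = options.most_common(1)[0]
        if n >= min_count:
            rules[wrong] = right
    return rules


def apply_rules(mt: str, rules: dict) -> str:
    """Rewrite new MT output with the learned rules, preferring the longest match."""
    tokens = mt.split()
    max_len = max((len(k.split()) for k in rules), default=1)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in rules:
                out.append(rules[phrase])
                i += n
                break
        else:  # no rule matched at position i: copy the token unchanged
            out.append(tokens[i])
            i += 1
    return " ".join(out)


# Hypothetical toy "change log": (raw MT output, volunteer post-edit).
log = [
    ("the computing machine is fast", "the computer is fast"),
    ("buy a computing machine", "buy a computer"),
    ("open the archive", "open the file"),  # seen once, so filtered out
]
rules = learn_rules(log)
print(rules)  # {'computing machine': 'computer'}
print(apply_rules("my computing machine crashed", rules))  # my computer crashed
```

The `min_count` filter reflects a simple design choice: a correction made only once may be a stylistic preference of one editor, while a repeated one is more likely to reflect a systematic error in the MT output.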

Publication date: 2014